A Dynamically Reconfigurable Model for a Distributed Web Crawling System
Abstract
A web crawling system with a distributed architecture must coordinate the whole system whenever its set of nodes changes. This paper presents an efficient dynamic reconfigurability model for such a system. By analyzing the model, we derive methods that optimize the performance of the distributed web crawling system, i.e., that retain load balance and produce low network traffic. This dynamic reconfigurability model is currently being introduced to improve WebGather, a well-known Chinese and English web search engine. We also believe the model can be useful in other web crawling systems that adopt a distributed architecture.
Related resources
A maintenance system model for optimal reconfigurable vibrating screen management
The reconfigurable vibrating screen (RVS) machine is an innovative beneficiation machine designed to screen mineral particles of the varying sizes and volumes required by customers through the geometric transformation of its screen structure. Successful upkeep of the RVS machine requires continuous availability, reliability, and maintainability. The RVS machine downtime, which c...
Towards Distributed Web Mining in Net-Enabled Enterprises
In today’s information age, web sites have become an important source for business information collection and analysis. They provide a company with abundant information for competitor analysis and business intelligence. Also, web mining on a firm’s intranet can greatly assist its knowledge management efforts. However, web mining is a complex and resource-consuming process that con...
Prioritize the ordering of URL queue in Focused crawler
The enormous growth of the World Wide Web in recent years has made it necessary to perform resource discovery efficiently. For a crawler, downloading domain-specific web pages is not a simple task, and an unfocused approach often produces undesired results. Therefore, several new ideas have been proposed; a key technique among them is focused crawling, which is able to crawl particular topical...
Service Description in a Distributed Search and Advertising System
Service description in a distributed system allows system components to know about one another and to make intelligent decisions regarding request routing and propagation. The paper discusses the service description model used in a distributed system for Web search and search-based advertising. A service description consists of a content description (terms), an attribute set, and a number of service par...
Scale-Adaptable Recrawl Strategies for DHT-Based Distributed Web Crawling System
A large-scale distributed Web crawling system that uses voluntarily contributed personal computing resources allows small companies to build their own search engines at very low cost. The biggest challenge for such a system is how to implement functionality equivalent to that of traditional search engines in a fluctuating distributed environment. One of the functionalities is incremental...
Publication date: 2001